Abstract

An alphabetic binary tree formulation applies to problems in which an outcome
needs to be determined via alphabetically ordered search prior to the
termination of some window of opportunity. Rather than finding a decision tree
minimizing $\sum_{i=1}^n w(i) l(i)$, this variant involves minimizing
$\log_a \sum_{i=1}^n w(i) a^{l(i)}$ for a given $a \in (0,1)$. This note
introduces a dynamic programming algorithm that finds the optimal solution in
polynomial time and space, and shows that methods traditionally used to improve
the speed of optimizations in related problems, such as the Hu-Tucker
procedure, fail for this problem. This note thus also introduces two
approximation algorithms which can find a suboptimal solution in linear time
(for one) or $O(n \log n)$ time (for the other), with associated coding
redundancy bounds.
1 Introduction



Applications such as searching [8] and coding theory [6] make extensive use of
binary trees. We denote the length (number of edges) of a path from the root
to node $i \in \{1, 2, \ldots, n\}$ of the tree as $l(i)$ and the weight
(usually probability) of the leaf as $w(i)$. Given a set of weights, Huffman's
algorithm [6] finds a tree minimizing the cost function

$$\sum_{i=1}^n w(i) l(i) \tag{1}$$

and Hu and Tucker's algorithm [5] finds an optimal alphabetic tree:

Definition 1 An alphabetic tree is a tree whose leaves are in numerical order
under inorder tree traversal (i.e., $1, 2, \ldots, n$ from left to right).

Three papers independently considered the problem of minimizing

$$L_a(w,l) = \log_a \sum_{i=1}^n w(i) a^{l(i)} \tag{2}$$

for $a > 1$ [5, p. 254] [10, p. 485] [7, p. 231] in the unconstrained
(Huffman-like) setting, the solution of which is very similar to that of
Huffman's algorithm. One of these further noted that an algorithm similar to
Hu and Tucker's solves the alphabetically constrained version of this problem
[5], while another noted that the Huffman-like solution also solves the
unconstrained (2) for $a < 1$ [7], in which $\log_a x$ is monotonically
decreasing and the objective's summation term is thus maximized.

A recent paper showed that the $a < 1$ problem describes certain situations
of single-shot decision-making [1]. Given a window of time corresponding to
a memoryless random variable, if we wish to find the leaf of the binary tree
through constant-time edge traversal, this is found in time with probability
$a^{L_a(w,l)}$ for some known $a < 1$, so we wish to minimize $L_a(w,l)$.
However, solving the alphabetic version of this problem remained unaddressed.
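The probabilistic reading can be spelled out in one line. This is only a sketch under an assumed model (the precise setup is in [1]): each edge traversal takes one time unit, the item sought is leaf $i$ with probability $w(i)$, and the window length $T$ is taken to satisfy $\Pr[T > t] = a^t$, as a memoryless random variable would.

```latex
\Pr[\text{found in time}]
  = \sum_{i=1}^{n} w(i)\,\Pr[T > l(i)]
  = \sum_{i=1}^{n} w(i)\,a^{l(i)}
  = a^{L_a(w,l)} .
```

Because $a < 1$, a larger success probability corresponds to a smaller $L_a(w,l)$.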

Here we present an $O(n^3)$ algorithm for minimizing (2) that is somewhat
similar to Gilbert and Moore's method for (1) [4]. We then introduce
counterexamples to attempts to minimize using faster methods, such as the
modification of Hu and Tucker's, which only succeeds for $a > 1$. Finally we
present approximation algorithms, related to those for the linear problem,
which find suboptimal solutions in $O(n)$ and $O(n \log n)$ time, leading to
simple bounds for both these solutions and the optimal ones.



2 Optimal Alphabetic Trees 

Because the alphabetic tree imposes leaf order, each decision of which child to
take, represented by a 0 (for left) or 1 (for right), is equivalent to a
question of the form, "Is the output greater than or equal to $s$?" where $s$
is one of the possible symbols, a symbol we call the splitting point:

Definition 2 The splitting point of an internal node (or the corresponding 
subtree) is the smallest index among the leaves of the right subtree. 

Definition 3 Each codeword $c(i)$ is the sequence of bits corresponding to the
sequence of decisions (path) to arrive at leaf $i$. The overall set of
codewords (the alphabetic code $C$) fully describes the tree, as does the
length vector $l$, the sequence of lengths $\{l(i)\}$.

The dynamic programming approach of Gilbert and Moore [4] is adapted to
this problem (2):

Theorem 1 An algorithm finds the maximum tree weight $W_{j,k}$ (and
corresponding optimum tree) for items $j$ through $k$ for each value of
$k - j$ from 0 to $n - 1$ (in order), by computing inductively

$$W_{j,k} \leftarrow a \max_{s \in \{j+1, j+2, \ldots, k\}} \left[ W_{j,s-1} + W_{s,k} \right] \quad \text{starting with } W_{j,j} \leftarrow w(j) \tag{3}$$

for $1 \le j < k \le n$ in $O(n^3)$ time and $O(n^2)$ space.

Proof Recall first that maximizing $a^{L_a(w,l)} = \sum_{i=1}^n w(i) a^{l(i)}$
minimizes $L_a(w,l)$, which is why (3) is a maximization operation. One can
see that $W_{1,n} = a^{L_a(w,l)}$ inductively by considering a (sub)tree's two
subtrees as independent, rooted trees, one with summation
$W_{1,s-1} = \sum_{i=1}^{s-1} w(i) a^{l(i)-1}$, the other with summation
$W_{s,n} = \sum_{i=s}^{n} w(i) a^{l(i)-1}$. Then
$W_{1,n} = a(W_{1,s-1} + W_{s,n})$. Starting with $W_{j,j} = w(j)$, then, we
see that these values can be built up accordingly (since the path length from
a leaf to itself is 0, $W_{j,j} = a^0 w(j)$, and there is nothing numerically
special about the final tree). Since all subtrees of an optimal tree are
optimal, via a substitution argument (e.g., [8]), the maximization finds an
optimal solution. This suggests the dynamic programming algorithm; similarly
to [8], having calculated all optimal subtrees of a size less than that of
the (sub)tree in a current step, we can try all possible splitting points
using optimal subtrees, yielding the optimal tree.

$O(n^2)$ items are stored: $O(n^2)$ weights, one for every possible range,
and the associated splitting points. The splitting points are used to
recursively find the implied tree. Each stored item is calculated by testing
$O(n)$ splitting points, thus the time and space complexity. ■
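The recurrence (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the note's own implementation: the function name `optimal_alphabetic_tree` is hypothetical, items are indexed from 0 instead of 1, and ties between splitting points are broken in favor of the smallest index.

```python
def optimal_alphabetic_tree(w, a):
    """Maximize sum_i w[i] * a**l[i] over alphabetic trees via recurrence (3).

    Returns (W[0][n-1], lengths, split), where split[j][k] is the 0-based
    splitting point (smallest index in the right subtree) for items j..k.
    """
    n = len(w)
    W = [[0.0] * n for _ in range(n)]
    split = [[None] * n for _ in range(n)]
    for j in range(n):
        W[j][j] = w[j]                      # a leaf is at distance 0 from itself
    for size in range(1, n):                # size = k - j, in increasing order
        for j in range(n - size):
            k = j + size
            best = max(range(j + 1, k + 1),
                       key=lambda s: W[j][s - 1] + W[s][k])
            W[j][k] = a * (W[j][best - 1] + W[best][k])
            split[j][k] = best

    lengths = [0] * n

    def assign(j, k, depth):
        # Walk the stored splitting points to recover the leaf depths.
        if j == k:
            lengths[j] = depth
        else:
            s = split[j][k]
            assign(j, s - 1, depth + 1)
            assign(s, k, depth + 1)

    assign(0, n - 1, 0)
    return W[0][n - 1], lengths, split


if __name__ == "__main__":
    print(optimal_alphabetic_tree([8, 1, 9, 6, 2], 0.6)[:2])
```

The three nested ranges (subtree size, left endpoint, splitting point) account for the cubic time, and the two $n \times n$ tables for the quadratic space.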

Knuth [8] reduced the algorithmic complexity of Gilbert and Moore's method
for (1) by using the fact that the splitting point of an optimal tree of size
$n$ must be between the splitting points of the two optimal subtrees of size
$n - 1$. With (2), this no longer holds. Consider $a = 0.6$ with input weights
$w = (8, 1, 9, 6)$. The splitting point of $(8, 1, 9)$ is $s = 3$
($w(s) = w(3) = 9$, yielding subtrees with $(8, 1)$ and $(9)$), and the
splitting point of $(1, 9, 6)$ is $s = 4$ ($w(s) = 6$). However, the optimal
splitting point of $(8, 1, 9, 6)$ is $s = 2$ ($w(s) = 1$).
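For readers who want to check the counterexample by hand, the values of recurrence (3) for $w = (8, 1, 9, 6)$ and $a = 0.6$ work out as follows (each maximizing splitting point is noted on the right):

```latex
\begin{aligned}
W_{1,2} &= 0.6\,(8 + 1) = 5.4, \qquad W_{2,3} = 0.6\,(1 + 9) = 6, \qquad W_{3,4} = 0.6\,(9 + 6) = 9,\\
W_{1,3} &= 0.6 \max\{8 + 6,\; 5.4 + 9\} = 0.6 \cdot 14.4 = 8.64 && (s = 3),\\
W_{2,4} &= 0.6 \max\{1 + 9,\; 6 + 6\} = 0.6 \cdot 12 = 7.2 && (s = 4),\\
W_{1,4} &= 0.6 \max\{8 + 7.2,\; 5.4 + 9,\; 8.64 + 6\} = 0.6 \cdot 15.2 = 9.12 && (s = 2).
\end{aligned}
```

The splitting point $s = 2$ of the full problem lies outside the range $[3, 4]$ that a Knuth-style restriction would search.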

Similarly, for (2) with $a > 1$ [5], there is a procedure based on the
Hu-Tucker algorithm for finding an optimal alphabetic solution. The Hu-Tucker
algorithm begins with the input weights arranged as leaves in numerical order
($1, 2, \ldots, n$ in a line). It then combines the two items $i$ and $j$
that, of all pairs of items without a leaf separating them, have a minimum
weight sum, putting the combined item in the place of either node, both of
which are now (ordered) children. In the original Hu-Tucker algorithm, this
item is given weight $w(i) + w(j)$, whereas for (2) with $a > 1$ it is given
weight $a w(i) + a w(j)$. The algorithm then finds the minimum weighted pair
among those pairs of distinct items (uncombined input leaves and combined
items) without any uncombined leaf between them, placing the resulting node
in the place of either original node. Continuing on, we obtain a tree that is
not necessarily alphabetic, but which has the same lengths as an alphabetic
tree which can be easily reconstructed, (optimally) solving the problem (for
$a > 1$).

However, consider again $a = 0.6$, this time for weights $(8, 1, 9, 6, 2)$.
The Hu-Tucker-like algorithm first combines 6 and 2, then 8 and 1, then the
first combined node with 9, and finally the remaining two nodes, resulting in
a tree with lengths $l' = (2, 2, 2, 3, 3)$ and $L_a(w, l') \approx -4.121$.
However, a tree with lengths $l'' = (1, 3, 3, 3, 3)$, having
$L_a(w, l'') \approx -4.232$, shows that the Hu-Tucker-like solution is
nonoptimal.
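The two objective values quoted above are easy to reproduce. The helper below is a small sketch that evaluates (2) directly; its name `exponential_cost` is hypothetical and it is not part of the note.

```python
import math


def exponential_cost(w, lengths, a):
    """Evaluate L_a(w, l) = log_a( sum_i w(i) * a**l(i) )."""
    total = sum(wi * a ** li for wi, li in zip(w, lengths))
    return math.log(total, a)


if __name__ == "__main__":
    w = [8, 1, 9, 6, 2]
    print(round(exponential_cost(w, [2, 2, 2, 3, 3], 0.6), 3))  # Hu-Tucker-like lengths
    print(round(exponential_cost(w, [1, 3, 3, 3, 3], 0.6), 3))  # better alphabetic lengths
```

Since the weights here are not normalized, the values are negative; only their ordering matters for the comparison.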

Result 1 Knuth's method for speeding up dynamic programming fails for
$a < 1$, as does using the Hu-Tucker-like method optimal for $a > 1$.



3 Approximation Algorithms and Bounds 

In this section, we add the assertion $\sum_{i=1}^n w(i) = 1$ to our problem,
which can be considered an optimization of (2) with constraints:

1. The Kraft inequality of binary trees, $\sum_{i=1}^n 2^{-l(i)} \le 1$;

2. The integer constraint, $l(i) \in \mathbb{Z}$;

3. The alphabetic constraint.

The first and second of these are necessary and sufficient for the lengths to 
correspond to a binary tree. Relaxing the second and third allows for a numer- 
ical solution which can bound the performance of the optimal solution. The 
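numerical solution, $l^\dagger$, shown by Campbell [2,3], results in the
Shannon-like

$$l^s(i) = \left\lceil l^\dagger(i) \right\rceil = \left\lceil -\frac{1}{1 + \log_2 a} \log_2 w(i) + \log_2 \sum_{j=1}^n w(j)^{\frac{1}{1 + \log_2 a}} \right\rceil$$

a valid (but not necessarily optimal) solution to the problem with only the
alphabetic constraint relaxed, that is, the Huffman-like problem.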
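The Shannon-like lengths can be computed directly from the weights. The following is a minimal sketch, assuming normalized weights and $a \in (0.5, 1)$ so that $1 + \log_2 a > 0$; the function name `shannon_like_lengths` is hypothetical.

```python
import math


def shannon_like_lengths(w, a):
    """Ceiling of Campbell's real-valued solution.

    Assumes sum(w) == 1, every w[i] > 0, and 0.5 < a < 1.
    """
    exponent = 1.0 / (1.0 + math.log2(a))
    log_norm = math.log2(sum(p ** exponent for p in w))
    return [math.ceil(-exponent * math.log2(p) + log_norm) for p in w]


if __name__ == "__main__":
    w = [x / 26 for x in (8, 1, 9, 6, 2)]
    print(shannon_like_lengths(w, 0.6))
```

Because each real-valued length is only rounded up, the resulting lengths still satisfy the Kraft inequality.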

The approximation algorithm in Fig. 1 has a linear-time variant patterned
after that in [11], relying on $l^s$, and an $O(n \log n)$-time variant
patterned after [9], instead using $l^h$, those lengths obtained from solving
the optimal
Procedure for Finding a Near-Optimal Code

1. Start with an optimal or near-optimal nonalphabetic code with length
vector $l^{non}$, either the Shannon-like $l^s$ or the Huffman-like $l^h$.

2. Find the set of all minimal points: $i$ such that $1 < i < n$,
$l^{non}(i) < l^{non}(i-1)$, and $l^{non}(i) < l^{non}(i+1)$; or
$i \in [j, j+k]$ minimizing $w(i)$ for
$l^{non}(j-1) > l^{non}(j) = l^{non}(j+1) = \cdots = l^{non}(j+k) < l^{non}(j+k+1)$.

3. Assign a preliminary alphabetic code with lengths
$l^{pre}(i) = l^{non}(i) + 1$ for all minimal points and
$l^{pre}(i) = l^{non}(i)$ for all other items. The first codeword is
$l^{pre}(1)$ zeros, and each additional codeword $c(i)$ is obtained by either
truncating $c(i-1)$ to $l^{pre}(i)$ bits and adding 1 to the integer that the
binary codeword represents (if $l^{pre}(i) \le l^{pre}(i-1)$) or by adding 1
to the integer/codeword $c(i-1)$ and appending $l^{pre}(i) - l^{pre}(i-1)$
zeros (if $l^{pre}(i) > l^{pre}(i-1)$), defining the binary tree.

4. Go through the code tree (with, e.g., a depth-first search), and remove any
redundant nodes. Any node with only one child can replace the child by
its grandchild or grandchildren. At the end of this process, an alphabetic
code with $\sum_{i=1}^n 2^{-l(i)} = 1$ is obtained.



code tree for the Huffman-like problem. 

Every step after the first takes linear time with linear space, which gives
the overall complexity of the algorithms. Step 3 is the method by which
Nakatsu showed that any nonalphabetic code can be made into an alphabetic code
with similar lengths [9]. (The use of weights as a tie breaker and the
nonlinearity of the problem do not change the validity of the algorithm.)
Step 4 is the method by which Yeung showed that any alphabetic code can be
made into another alphabetic code with $\sum_{i=1}^n 2^{-l(i)} = 1$ without
lengthening any codewords [11]. Thus this is a hybrid and extension of these
two approaches.
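Steps 2 through 4 of the procedure can be sketched as follows. This is an illustrative reading of the steps, not the authors' code: all function names are hypothetical, items are indexed from 0, codewords are held as bit strings, and the input is assumed to have at least two items with every length at least 1. The trie used for Step 4 is chosen for clarity and is not tuned to the linear-time bound.

```python
def minimal_points(l_non, w):
    """Step 2: interior local minima of l_non; on a plateau, the lightest item."""
    n, points, j = len(l_non), [], 0
    while j < n:
        k = j
        while k + 1 < n and l_non[k + 1] == l_non[j]:
            k += 1                                   # extend the run of equal lengths
        if j > 0 and k < n - 1 and l_non[j - 1] > l_non[j] and l_non[k + 1] > l_non[j]:
            points.append(min(range(j, k + 1), key=lambda i: w[i]))
        j = k + 1
    return points


def preliminary_code(l_pre):
    """Step 3: build codewords left to right from the preliminary lengths."""
    code = ["0" * l_pre[0]]
    for prev_len, cur_len in zip(l_pre, l_pre[1:]):
        prev = code[-1]
        if cur_len <= prev_len:                      # truncate, then add 1
            value = int(prev[:cur_len], 2) + 1
            code.append(format(value, "0{}b".format(cur_len)))
        else:                                        # add 1, then append zeros
            value = int(prev, 2) + 1
            code.append(format(value, "0{}b".format(prev_len)) + "0" * (cur_len - prev_len))
    return code


def pruned_lengths(code):
    """Step 4: contract every single-child node; return final leaf depths in order."""
    root = {}
    for word in code:
        node = root
        for bit in word:
            node = node.setdefault(bit, {})
    lengths = []

    def visit(node, depth):
        if not node:                                 # leaf
            lengths.append(depth)
        elif len(node) == 1:                         # redundant node: skip its only edge
            visit(next(iter(node.values())), depth)
        else:
            visit(node["0"], depth + 1)
            visit(node["1"], depth + 1)

    visit(root, 0)
    return lengths


def near_optimal_lengths(l_non, w):
    l_pre = list(l_non)
    for i in minimal_points(l_non, w):
        l_pre[i] += 1
    return pruned_lengths(preliminary_code(l_pre))


if __name__ == "__main__":
    print(near_optimal_lengths([2, 13, 1, 4, 10], [8, 1, 9, 6, 2]))
```

Visiting the `0` child before the `1` child keeps the leaves in their original left-to-right order, so the returned lengths line up with the input items.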

For $w = (8/26, 1/26, 9/26, 6/26, 2/26)$ with $a = 0.6$, applying the
Shannon-like version of this algorithm, we find that
$l^s = (2, 13, 1, 4, 10)$, preliminary codeword lengths are
$l^{pre} = (2, 13, 2, 4, 10)$, and the preliminary code tree is as follows:

C = (00, 0100000000000, 10, 1100, 1101000000)

The eleven trailing zeros of the second codeword, the third bit of the fourth
and fifth codewords, and the six trailing zeros of the fifth codeword are
redundant, and therefore so are the corresponding nodes in the code tree.
They are thus removed in Step 4, which means the
final tree has lengths $(2, 2, 2, 3, 3)$. The probability of success is
$a^{L_a(w,l)} \approx 0.316$ ($L_a(w,l) \approx 2.257$), close to the optimal
probability of about 0.334 ($L_a(w,l^*) \approx 2.146$). Using the
Huffman-like approximation algorithm yields
$l^h = (2, 4, 1, 3, 4)$, a preliminary tree of lengths
$l^{pre} = (2, 4, 2, 3, 4)$, and an output tree with lengths
$(2, 2, 2, 3, 3)$, which are identical to the above. The same probability mass
function with $a = 0.7$ yields an optimal tree in the Huffman-like version.
For $a \in (0.5, 1)$, coding bounds follow from these approaches:
Theorem 2 Let

• $L^*_a(w)$ be the minimized (2) for the alphabetic problem,

• $\hat{L}^h_a(w)$ be that obtained using the $l^h$-based approximation algorithm,

• $\hat{L}^s_a(w)$ be that obtained using the $l^s$-based approximation algorithm,

• $L^{non}_a(w) = L_a(w, l^{non})$, $L^s_a(w) = L_a(w, l^s)$, and
$L^h_a(w) = L_a(w, l^h)$ (using those $l$ values from Fig. 1).

Then

$$H_\alpha(w) \le L^h_a(w) \le L^*_a(w) \le \hat{L}^h_a(w) < 1 + L^h_a(w) < 2 + H_\alpha(w) \tag{4}$$

$$H_\alpha(w) \le L^h_a(w) \le L^*_a(w) \le \hat{L}^s_a(w) < 1 + L^s_a(w) < 2 + H_\alpha(w) \tag{5}$$

where $H_\alpha(w)$ is the Rényi entropy for $\alpha = (1 + \log_2 a)^{-1}$:

$$H_\alpha(w) = \frac{1}{1 - \alpha} \log_2 \sum_{i=1}^n w(i)^\alpha = (\log_a 2a) \log_2 \sum_{i=1}^n w(i)^{\frac{1}{1 + \log_2 a}}$$

Proof This is a corollary of Campbell's Shannon-like bounds for $a > 0.5$,
$H_\alpha(w) \le L^h_a(w) \le L^s_a(w) < 1 + H_\alpha(w)$, along with the
facts that (a) the two approximation algorithm lengths corresponding to items
1 and $n$ are no greater than those in $l^{non}$ and (b) no other length
exceeds the corresponding length in $l^{non}$ by more than 1. This results in
$\hat{L}^h_a(w) < 1 + L^h_a(w)$ and $\hat{L}^s_a(w) < 1 + L^s_a(w)$ due to
(2), and, since no alphabetic tree is better than the optimal alphabetic tree
and no alphabetic tree is better than the optimal Huffman-like tree, we arrive
at (4) and (5). ■

The lower limit to $L^*_a(w)$ is satisfied by $w = (\tfrac{1}{2}, \tfrac{1}{2})$,
while the upper limit is approached by $w = (\epsilon, 1 - 2\epsilon, \epsilon)$,
which approaches entropy 0 and penalty 2.
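The bounds can be checked numerically on the running example. The sketch below uses hypothetical helper names (`exp_cost`, `renyi_entropy`) and takes the length vectors quoted in the text as given rather than recomputing them.

```python
import math


def exp_cost(w, lengths, a):
    """L_a(w, l) = log_a( sum_i w(i) * a**l(i) )."""
    return math.log(sum(p * a ** l for p, l in zip(w, lengths)), a)


def renyi_entropy(w, a):
    """Renyi entropy of order alpha = 1 / (1 + log2 a); assumes 0.5 < a < 1."""
    alpha = 1.0 / (1.0 + math.log2(a))
    return math.log2(sum(p ** alpha for p in w)) / (1.0 - alpha)


w = [x / 26 for x in (8, 1, 9, 6, 2)]
a = 0.6
H = renyi_entropy(w, a)
L_h = exp_cost(w, (2, 4, 1, 3, 4), a)        # Huffman-like lengths from the text
L_s = exp_cost(w, (2, 13, 1, 4, 10), a)      # Shannon-like lengths from the text
L_opt = exp_cost(w, (1, 3, 3, 3, 3), a)      # optimal alphabetic lengths
L_approx = exp_cost(w, (2, 2, 2, 3, 3), a)   # output of both approximation variants

chain_h = H <= L_h <= L_opt <= L_approx < 1 + L_h < 2 + H
chain_s = H <= L_h <= L_opt <= L_approx < 1 + L_s < 2 + H

if __name__ == "__main__":
    print(chain_h, chain_s)
```

The same helpers can be used to watch the gap $L_a - H_\alpha$ approach 2 for weights of the form $(\epsilon, 1 - 2\epsilon, \epsilon)$ with lengths $(2, 2, 1)$.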

Both these algorithms and the bounds due to analogous inequalities apply
to $a > 1$ and to the traditional alphabetic problem ($a \to 1$, where $H_1$
is Shannon entropy [3]). For the traditional problem, due to Step 4, the
Huffman-based approximation version of the above algorithm is a strict
improvement on Yeung's Huffman-based approximation [11].